{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第三题：神经网络：对数几率回归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实验内容：\n",
    "1. 完成对数几率回归\n",
    "2. 使用梯度下降求解模型参数\n",
    "3. 绘制模型损失值的变化曲线\n",
    "4. 调整学习率和迭代轮数，观察损失值曲线的变化\n",
    "5. 按照给定的学习率和迭代轮数，初始化新的参数，绘制新模型在训练集和测试集上损失值的变化曲线，完成表格内精度的填写"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对数几率回归，二分类问题的分类算法，属于线性模型中的一种，我们可以将其抽象为最简单的神经网络。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://davidham3.github.io/blog/2018/09/11/logistic-regression/Fig1.png\" ,width=300>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "只有一个输入层和一个输出层，还有一个激活函数，$\\rm sigmoid$，简记为$\\sigma$。  \n",
    "我们设输入为$X \\in \\mathbb{R}^{n \\times m}$，输入层到输出层的权重为$W \\in \\mathbb{R}^{m}$，偏置$b \\in \\mathbb{R}$。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 激活函数\n",
    "\n",
    "$$\n",
    "\\mathrm{sigmoid}(x) = \\frac{1}{1 + e^{-x}}\n",
    "$$\n",
    "\n",
    "这个激活函数，会将输出层的神经元的输出值转换为一个 $(0, 1)$ 区间内的数。\n",
    "\n",
    "因为是二分类问题，我们设类别为0和1，我们将输出值大于0.5的样本分为1类，输出值小于0.5的类分为0类。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 前向传播\n",
    "\n",
    "$$\n",
    "Z = XW + b\\\\\n",
    "\\hat{y} = \\sigma(Z)\n",
    "$$\n",
    "\n",
    "其中，$O \\in \\mathbb{R}^{n}$为输出层的结果，$\\sigma$为$\\rm sigmoid$激活函数。\n",
    "\n",
    "**注意：这里我们其实是做了广播，将$b$复制了$n-1$份后拼接成了维数为$n$的向量。**\n",
    "\n",
    "所以对数几率回归就可以写为：\n",
    "\n",
    "$$\n",
    "\\hat{y} = \\frac{1}{1 + e^{-XW + b}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 损失函数\n",
    "\n",
    "使用对数损失函数，因为对数损失函数较其他损失函数有更好的性质，感兴趣的同学可以去查相关的资料。 \n",
    "\n",
    "针对二分类问题的对数损失函数：\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, \\hat{y}) = - y \\log{\\hat{y}} - (1 - y) \\log{(1 - \\hat{y})}\n",
    "$$\n",
    "\n",
    "在这个对数几率回归中，我们的损失函数对所有样本取个平均值：\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, \\hat{y}) = - \\frac{1}{n} \\sum^n_{i = 1}[y_i \\log{\\hat{y_i}} + (1 - y_i) \\log{(1 - \\hat{y_i})}]\n",
    "$$\n",
    "\n",
    "**注意，这里我们的提到的$\\log$均为$\\ln$，在numpy中为**`np.log`。\n",
    "\n",
    "因为我们的类别只有0和1，所以在这个对数损失函数中，要么前一项为0，要么后一项为0。\n",
    "\n",
    "如果当前样本的类别为0，那么前一项就为0，损失函数变为 $- \\log{(1 - \\hat{y})}$ ，因为我们的预测值 $0 < \\hat{y} < 1$ ，所以 $0 < 1 - \\hat{y} < 1$ ，$- \\log{(1 - \\hat{y})} > 0$ ，为了降低损失值，模型需要让预测值 $\\hat{y}$不断地趋于0。\n",
    "\n",
    "同理，如果当前样本的类别为1，那么降低损失值就可以使模型的预测值趋于1。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参数更新\n",
    "\n",
    "求得损失函数对参数的偏导数后，我们就可以使用**梯度下降**进行参数更新：\n",
    "\n",
    "$$\n",
    "W := W - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial W}\\\\\n",
    "b := b - \\alpha \\frac{\\partial \\mathrm{loss}}{\\partial b}\n",
    "$$\n",
    "\n",
    "其中，$\\alpha$ 是学习率，一般设置为0.1，0.01等。\n",
    "\n",
    "经过**一定次数**的迭代后，参数会收敛至最优点。这种基于梯度的优化算法很常用，训练神经网络主要使用这类优化算法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 反向传播\n",
    "\n",
    "我们使用梯度下降更新参数$W$和$b$。为此需要求得损失函数对参数$W$和$b$的偏导数，根据链式法则有：\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial W} &= \\frac{\\partial \\mathrm{loss}}{\\partial \\hat{y}} \\frac{\\partial \\hat{y}}{\\partial Z} \\frac{\\partial Z}{\\partial W}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "这里我们一项一项求，先求第一项：\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial \\hat{y}} = - \\frac{1}{n} \\sum^n_{i = 1} [\\frac{y}{\\hat{y}} - \\frac{1 - y}{1 - \\hat{y}}]\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "第二项：\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\hat{y}}{\\partial Z} & = \\frac{\\partial (\\frac{1}{1 + e^{-Z}})}{\\partial Z}\\\\\n",
    "& = \\frac{e^{-Z}}{(1 + e^{-Z})^2}\\\\\n",
    "& = \\frac{e^{-Z}}{(1 + e^{-Z})} \\frac{1}{(1 + e^{-Z})}\\\\\n",
    "& = \\frac{e^{-Z}}{(1 + e^{-Z})} (1 - \\frac{e^{-Z}}{(1 + e^{-Z})})\\\\\n",
    "& = \\sigma(Z)(1 - \\sigma(Z))\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "第三项：\n",
    "\n",
    "$$\n",
    "\\frac{\\partial Z}{\\partial W} = X^{\\mathrm{T}}\n",
    "$$\n",
    "\n",
    "综上：\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial W} &= \\frac{\\partial \\mathrm{loss}}{\\partial \\hat{y}} \\frac{\\partial \\hat{y}}{\\partial Z} \\frac{\\partial Z}{\\partial W}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [\\frac{y_i}{\\hat{y_i}} - \\frac{1 - y_i}{1 - \\hat{y_i}}] [\\sigma(Z_i)(1 - \\sigma(Z_i))] {X_i}^{\\mathrm{T}}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [\\frac{y_i}{\\hat{y_i}} - \\frac{1 - y_i}{1 - \\hat{y_i}}] [\\hat{y_i}(1 - \\hat{y_i})] {X_i}^{\\mathrm{T}}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [y_i(1 - \\hat{y_i}) - \\hat{y_i}(1 - y_i)] {X_i}^{\\mathrm{T}}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} (y_i - y_i \\hat{y_i} - \\hat{y_i} + y_i \\hat{y_i}) {X_i}^{\\mathrm{T}}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} (y_i - \\hat{y_i}) {X_i}^{\\mathrm{T}}\\\\\n",
    "&= \\frac{1}{n} [X^{\\mathrm{T}}(\\hat{y} - y)]\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "同理，求$\\rm loss$对$b$的偏导数：\n",
    "\n",
    "**注意，由于$b$是被广播成$n \\times K$的矩阵，因此实际上$b$对每个样本的损失都有贡献，因此对其求偏导时，要把$n$个样本对它的偏导数加和。**\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\frac{\\partial \\mathrm{loss}}{\\partial b} &= \\frac{\\partial \\mathrm{loss}}{\\partial \\hat{y}} \\frac{\\partial \\hat{y}}{\\partial Z} \\frac{\\partial Z}{\\partial b}\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [\\frac{y_i}{\\hat{y_i}} - \\frac{1 - y_i}{1 - \\hat{y_i}}] [\\sigma(Z_i)(1 - \\sigma(Z_i))]\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [\\frac{y_i}{\\hat{y_i}} - \\frac{1 - y_i}{1 - \\hat{y_i}}] [\\hat{y_i}(1 - \\hat{y_i})]\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} [y_i(1 - \\hat{y_i}) - \\hat{y_i}(1 - y_i)]\\\\\n",
    "&= - \\frac{1}{n} \\sum^n_{i = 1} (y_i - y_i \\hat{y_i} - \\hat{y_i} + y_i \\hat{y_i})\\\\\n",
    "&= \\frac{1}{n} \\sum^n_{i = 1} (\\hat{y_i} - y_i)\\\\\n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这样，我们就得到了损失函数对参数的偏导数，然后就可以使用梯度下降算法更新参数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 导入数据集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "from matplotlib.colors import ListedColormap"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们生成半月形数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_moons\n",
    "X, y = make_moons(n_samples = 2000, noise = 0.3, random_state=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize = (6, 6))\n",
    "cm_bright = ListedColormap(['#FF0000', '#0000FF'])\n",
    "plt.scatter(X[:, 0], X[:, 1], c = y, cmap = cm_bright, edgecolors = 'k')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "选择40%的数据作为测试集，60%作为训练集"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "trainX, testX, trainY, testY = train_test_split(X, y, test_size = 0.4, random_state = 32)\n",
    "trainY = trainY\n",
    "testY = testY"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainX.shape, trainY.shape, testX.shape, testY.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 数据预处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用和第一题一样的预处理方式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "s = StandardScaler()\n",
    "trainX = s.fit_transform(trainX)\n",
    "testX = s.transform(testX)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 定义神经网络"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 参数初始化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们需要对神经网络的参数进行初始化，这个网络中只有两个参数，一个$W \\in \\mathbb{R}^{m}$，一个$b \\in \\mathbb{R}$。初始化的时候，我们将参数W随机初始化，参数b初始化为0。为什么要对神经网络的参数进行随机初始化，感兴趣的同学可以去查相关的资料。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def initialize(m):\n",
    "    '''\n",
    "    初始化参数W和参数b\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    W: np.ndarray, shape = (m, )，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    '''\n",
    "    np.random.seed(32)\n",
    "    W = np.random.normal(size = (m, )) * 0.01\n",
    "    b = np.zeros((1, ))\n",
    "    return W, b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print(Wt.shape) # (2,)\n",
    "print(bt.shape) # (1,)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 前向传播"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来我们要定义神经网络前向传播的过程。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先计算$Z = XW + b$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def linear_combination(X, W, b):\n",
    "    '''\n",
    "    完成Z = XW + b的计算\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m)，输入的数据\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，权重\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，偏置\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    Z: np.ndarray, shape = (n, )，线性组合后的值\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return Z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "linear_combination(trainX, Wt, bt).shape #(1200,)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来实现激活函数$\\rm sigmoid$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def my_sigmoid(x):\n",
    "    '''\n",
    "    simgoid 1 / (1 + exp(-x))\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, 待激活的值\n",
    "    \n",
    "    '''\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return activations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "Zt = linear_combination(trainX, Wt, bt)\n",
    "my_sigmoid(Zt).mean() # 0.49999"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在实现$\\rm sigmoid$的时候，可能会遇到上溢(overflow)的问题，可以看到$\\rm sigmoid$中有一个指数运算\n",
    "$$\n",
    "\\mathrm{sigmoid}(x) = \\frac{1}{1 + e^{-x}}\n",
    "$$\n",
    "当$x$很大的时候，我们使用`numpy.exp(x)`会直接溢出"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "np.exp(1e56)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_sigmoid(np.array([-1e56]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "虽说程序没有报错，只是抛出了warning，但还是应该解决一下。\n",
    "\n",
    "解决这种问题的方法有很多，比如，我们可以将$\\rm sigmoid$进行变换：\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\mathrm{sigmoid}(x) &= \\frac{1}{1 + e^{-x}}\\\\\n",
    "&= \\frac{e^x}{1 + e^x}\\\\\n",
    "&= \\frac{1}{2} + \\frac{1}{2} \\mathrm{tanh}(\\frac{x}{2})\n",
    "\\end{aligned}$$\n",
    "\n",
    "其中，$\\mathrm{tanh}(x) = \\frac{\\mathrm{sinh}(x)}{\\mathrm{cosh}(x)} = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$\n",
    "\n",
    "转换成这种形式后，我们就可以直接利用`numpy.tanh`完成$\\rm sigmoid$的计算，就不会产生上溢的问题了。\n",
    "\n",
    "除此以外，最好的解决方法是使用scipy中的`expit`函数，完成$\\rm sigmoid$的计算。我们现在做的都是神经网络底层相关的运算，很容易出现数值不稳定性相关的问题，最好的办法就是使用别人已经实现好的函数，这样就能减少我们很多的工作量，同时又快速地完成任务。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.special import expit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sigmoid(X):\n",
    "    return expit(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "sigmoid(np.array([-1e56]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来完成整个前向传播的函数，也就是 $Z = XW+b$ 和 $\\hat{y} = \\mathrm{sigmoid}(Z)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def forward(X, W, b):\n",
    "    '''\n",
    "    完成输入矩阵X到最后激活后的预测值y_pred的计算过程\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m)，数据，一行一个样本，一列一个特征\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，权重\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，偏置\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    y_pred: np.ndarray, shape = (n, )，模型对每个样本的预测值\n",
    "    \n",
    "    '''\n",
    "    # 求Z\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    # 求激活后的预测值\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return y_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "forward(trainX, Wt, bt).mean() # 0.4999(没有四舍五入)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来完成损失函数的编写，我们使用的是对数损失，这里需要注意的一个问题是：\n",
    "\n",
    "$$\n",
    "\\mathrm{loss}(y, \\hat{y}) = - \\frac{1}{n}[ y \\log{\\hat{y}} + (1 - y) \\log{(1 - \\hat{y})}]\n",
    "$$\n",
    "\n",
    "在这个对数损失中，$\\hat{y}$中不能有$0$和$1$，如果有$0$，那么损失函数中的前半部分，$\\log{0}$就会出错，如果有$1$，那么后半部分$\\log{(1-1)}$就会出错。\n",
    "\n",
    "所以我们要先将$\\hat{y}$中的$0$和$1$改变一下，把$0$变成一个比较小但是大于$0$的数，把$1$变成小于$1$但是足够大的数。使用`numpy.clip`函数就可以作到这点。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def logloss(y_true, y_pred):\n",
    "    '''\n",
    "    给定真值y，预测值y_hat，计算对数损失并返回\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y_true: np.ndarray, shape = (n, ), 真值\n",
    "    \n",
    "    y_pred: np.ndarray, shape = (n, )，预测值\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    loss: float, 损失值\n",
    "    \n",
    "    '''\n",
    "    # 下面这句话会把y_pred里面小于1e-10的数变成1e-10，大于1 - 1e-10的数变成1 - 1e-10\n",
    "    y_hat = np.clip(y_pred, 1e-10, 1 - 1e-10)\n",
    "    \n",
    "    # 求解对数损失\n",
    "    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))                                     # YOUR CODE HERE\n",
    "    \n",
    "    return loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "logloss(trainY, forward(trainX, Wt, bt)) # 0.69740"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 反向传播"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们接下来要完成损失函数对参数的偏导数的计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient(y_true, y_pred, X):\n",
    "    '''\n",
    "    给定预测值y_pred，真值y_true，传入的输入数据X，计算损失函数对参数W的偏导数的导数值dW，以及对b的偏导数的导数值db\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y_true: np.ndarray, shape = (n, ), 真值\n",
    "    \n",
    "    y_pred: np.ndarray, shape = (n, )，预测值\n",
    "    \n",
    "    X: np.ndarray, shape = (n, m)，数据，一行一个样本，一列一个特征\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    dW: np.ndarray, shape = (m, ), 损失函数对参数W的偏导数\n",
    "    \n",
    "    db: float, 损失函数对参数b的偏导数\n",
    "    \n",
    "    '''\n",
    "    # 求损失函数对参数W的偏导数的导数值\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    # 求损失函数对参数b的偏导数的导数值\n",
    "    # YOUR CODE HERE\n",
    "    \n",
    "    return dW, db"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "dWt, dbt = compute_gradient(trainY, forward(trainX, Wt, bt), trainX)\n",
    "print(dWt.shape) # (2, )\n",
    "print(dWt.sum()) # 0.04625\n",
    "print(dbt)       # 0.00999"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 参数更新\n",
    "给定学习率，结合上一步求出的偏导数，完成梯度下降的更新公式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def update(W, b, dW, db, learning_rate):\n",
    "    '''\n",
    "    梯度下降，给定参数W，参数b，以及损失函数对他们的偏导数，使用梯度下降更新参数W和参数b\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    W: np.ndarray, shape = (m, )，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    dW: np.ndarray, shape = (m, ), 损失函数对参数W的偏导数\n",
    "    \n",
    "    db: float, 损失函数对参数b的偏导数\n",
    "    \n",
    "    learning_rate, float，学习率\n",
    "    \n",
    "    '''\n",
    "    # 对参数W进行更新\n",
    "    W -= learning_rate * dW\n",
    "    \n",
    "    # 对参数b进行更新\n",
    "    # YOUR CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "print(Wt)  # [-0.00348894  0.00983703]\n",
    "print(bt)  # [ 0.]\n",
    "print()\n",
    "\n",
    "dWt, dbt = compute_gradient(trainY, forward(trainX, Wt, bt), trainX)\n",
    "print(dWt) # [-0.28650366  0.33276308]\n",
    "print(dbt) # 0.00999999939463\n",
    "print()\n",
    "\n",
    "update(Wt, bt, dWt, dbt, 0.01)\n",
    "print(Wt)  # [-0.00062391  0.0065094 ]\n",
    "print(bt)  # [ -9.99999939e-05]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们来完成整个反向传播和更新参数的函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def backward(y_true, y_pred, X, W, b, learning_rate):\n",
    "    '''\n",
    "    反向传播，包含了计算损失函数对各个参数的偏导数的过程，以及梯度下降更新参数的过程\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y_true: np.ndarray, shape = (n, ), 真值\n",
    "    \n",
    "    y_pred: np.ndarray, shape = (n, )，预测值\n",
    "    \n",
    "    X: np.ndarray, shape = (n, m)，数据，一行一个样本，一列一个特征\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    dW: np.ndarray, shape = (m, ), 损失函数对参数W的偏导数\n",
    "    \n",
    "    db: float, 损失函数对参数b的偏导数\n",
    "    \n",
    "    learning_rate, float，学习率\n",
    "    \n",
    "    '''\n",
    "    # 求参数W和参数b的梯度\n",
    "    dW, db = compute_gradient(y_true, y_pred, X)\n",
    "    \n",
    "    # 梯度下降\n",
    "    update(W, b, dW, db, learning_rate)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "y_predt = forward(trainX, Wt, bt)\n",
    "loss_1 = logloss(trainY, y_predt)\n",
    "print(loss_1)                             # 0.697403529518\n",
    "\n",
    "backward(trainY, y_predt, trainX, Wt, bt, 0.01)\n",
    "\n",
    "y_predt = forward(trainX, Wt, bt)\n",
    "loss_2 = logloss(trainY, y_predt)\n",
    "print(loss_2)                             # 0.695477626714"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 训练函数的编写"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们已经实现了完成训练需要的子函数，接下来就是组装了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(trainX, trainY, testX, testY, W, b, epochs, learning_rate = 0.01, verbose = False):\n",
    "    '''\n",
    "    训练，我们要迭代epochs次，每次迭代的过程中，做一次前向传播和一次反向传播\n",
    "    同时记录训练集和测试集上的损失值，后面画图用\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    trainX: np.ndarray, shape = (n, m), 训练集\n",
    "    \n",
    "    trainY: np.ndarray, shape = (n, ), 训练集标记\n",
    "    \n",
    "    testX: np.ndarray, shape = (n_test, m)，测试集\n",
    "    \n",
    "    testY: np.ndarray, shape = (n_test, )，测试集的标记\n",
    "    \n",
    "    W: np.ndarray, shape = (m, )，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    epochs: int, 要迭代的轮数\n",
    "    \n",
    "    learning_rate: float, default 0.01，学习率\n",
    "    \n",
    "    verbose: boolean, default False，是否打印损失值\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    training_loss_list: list(float)，每迭代一次之后，训练集上的损失值\n",
    "    \n",
    "    testing_loss_list: list(float)，每迭代一次之后，测试集上的损失值\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    training_loss_list = []\n",
    "    testing_loss_list = []\n",
    "    \n",
    "    for i in range(epochs):\n",
    "        \n",
    "        # 计算训练集前向传播得到的预测值\n",
    "        # YOUR CODE HERE\n",
    "\n",
    "        # 计算当前训练集的损失值\n",
    "        # YOUR CODE HERE\n",
    "        \n",
    "        # 计算测试集前向传播得到的预测值\n",
    "        # YOUR CODE HERE\n",
    "        \n",
    "        # 计算当前测试集的损失值\n",
    "        # YOUR CODE HERE\n",
    "        \n",
    "        if verbose == True:\n",
    "            print('epoch %s, training loss:%s'%(i + 1, training_loss))\n",
    "            print('epoch %s, testing loss:%s'%(i + 1, testing_loss))\n",
    "            print()\n",
    "        \n",
    "        # 保存损失值\n",
    "        training_loss_list.append(training_loss)\n",
    "        testing_loss_list.append(testing_loss)\n",
    "        \n",
    "        # 反向传播更新参数\n",
    "        # YOUR CODE HERE\n",
    "        backward(trainY, train_y_pred, trainX, W, b, learning_rate)\n",
    "\n",
    "    \n",
    "    return training_loss_list, testing_loss_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "training_loss_list, testing_loss_list = train(trainX, trainY, testX, testY, Wt, bt, 2, 0.1)\n",
    "print(training_loss_list)  # [0.69740352951773121, 0.67843729060725722]\n",
    "print(testing_loss_list)   # [0.69743661286103986, 0.67880126235588389]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 绘制模型损失值变化曲线"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_loss_curve(training_loss_list, testing_loss_list):\n",
    "    '''\n",
    "    绘制损失值变化曲线\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    training_loss_list: list(float)，每迭代一次之后，训练集上的损失值\n",
    "    \n",
    "    testing_loss_list: list(float)，每迭代一次之后，测试集上的损失值\n",
    "    \n",
    "    '''\n",
    "    plt.figure(figsize = (10, 6))\n",
    "    plt.plot(training_loss_list, label = 'training loss')\n",
    "    plt.plot(testing_loss_list, label = 'testing loss')\n",
    "    plt.xlabel('epoch')\n",
    "    plt.ylabel('loss')\n",
    "    plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来编写一个预测的函数，事实上，$\\rm sigmoid$输出的是当前这个样本为正例的概率，也就是说，这个输出值是一个0到1的值，一般我们将大于0.5的值变成1，小于0.5的值变成0，也就是说，如果当前输出的概率值大于0.5，那我们认为这个样本的类别就是1，否则就是0，这样输出的就是类标了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict(X, W, b):\n",
    "    '''\n",
    "    预测，调用forward函数完成神经网络对输入X的计算，然后完成类别的划分，大于0.5的变为1，小于等于0.5的变为0\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X: np.ndarray, shape = (n, m), 训练集\n",
    "    \n",
    "    W: np.ndarray, shape = (m, 1)，参数W\n",
    "    \n",
    "    b: np.ndarray, shape = (1, )，参数b\n",
    "    \n",
    "    Returns\n",
    "    ----------\n",
    "    prediction: np.ndarray, shape = (n, 1)，预测的标记\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    # YOUR CODE HERE\n",
    "    return prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 测试样例\n",
    "from sklearn.metrics import accuracy_score\n",
    "Wt, bt = initialize(trainX.shape[1])\n",
    "predictiont = predict(testX, Wt, bt)\n",
    "accuracy_score(testY, predictiont)  # 0.16250000000000001"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. 训练一个神经网络"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们的学习率是0.01，迭代200轮"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "W, b = initialize(trainX.shape[1])\n",
    "training_loss_list, testing_loss_list = train(trainX, trainY, testX, testY, W, b, 200, 0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算测试集精度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction = predict(testX, W, b)\n",
    "accuracy_score(testY, prediction)  # 0.83625000000000005"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "绘制损失值变化曲线"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_loss_curve(training_loss_list, testing_loss_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 课后部分：初始化新的参数，学习率和迭代轮数按下表设置，绘制其训练集和测试集损失值的变化曲线，完成表格内精度的填写，并结合参数学习率对曲线变化及训练过程进行分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### 双击此处填写\n",
    "\n",
    "学习率|迭代轮数|测试集精度\n",
    "-|-|-\n",
    "0.0001|200|\n",
    "0.1|1000|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
